Abstract

Future Mars exploration missions will perform two 
types of experiments: science instrument placement 
for close-up measurement, and sample acquisition for 
return to Earth. In this paper we describe algo- 
rithms we developed for these tasks, and demonstrate 
them in field experiments using a self-contained Mars 
Rover prototype, the Rocky 7 rover. Our algorithms 
perform visual servoing on an elevation map instead 
of image features, because the latter are subject to 
abrupt scale changes during the approach. This al- 
lows us to compensate for the poor odometry that 
results from motion on loose terrain. 

We demonstrate the successful grasp of a 5 cm
long rock over 1 m away using 103-degree field-of-view
(FOV) stereo cameras, and placement of a flexible mast
on a rock outcropping over 5 m away using 43-degree
FOV stereo cameras.

1 Introduction 

NASA is engaged in a series of missions designed to 
study the planet Mars. The current schedule calls for 
5 pairs of orbiter/lander probes to be launched
approximately every two years, starting with the Mars
Pathfinder mission of 1997. The 2003 and 2005
missions, in particular, call for a rover with the ability to
traverse more than 1 kilometer away from its landing
site, acquiring samples along the way.

Autonomous robotic operations can greatly in- 
crease the science return of such planetary missions. 
As these operations become more adaptive, the bur- 
den of planning a sequence of motions is moved from 
the human operator to the onboard control system, 
allowing a greater number of targeted experiments 
to be achieved. In this paper we describe algorithms 
that allow a rover to autonomously approach and
collect (or analyze) a sample at a human-specified target
location.

Figure 1: The Rocky 7 rover

Our approach combines vision processing with ve- 
hicle and arm control. The target is identified in an 
image by a human operator, and its 3D location is 
computed onboard using stereo vision. A curved path 
toward the target point is planned, and executed in 
small steps. The shape of the terrain immediately 
around the target is used to reacquire the target at 
each step; we servo on the elevation map instead 
of image features, because the latter are subject to 
abrupt scale changes during the approach. This al- 
lows us to compensate for the poor odometry that 
results from motion on loose terrain, by visually reac- 
quiring the target at each step. Vehicle motion stops 
when the target appears within the workspace of the 
arm that will be used to grasp or study it. 

In the sections that follow, we survey related work
that uses visual servoing to guide end-effector mo- 
tion, describe the general algorithm, and detail the 
experimental results from field tests performed on the 
Rocky 7 Mars Rover prototype (see Figure 1). 
1. Acquire stereo image pair with body navigation cameras

2. Send the left image over wireless network to host 

3. Scientist/Operator selects target rock on left image 

4. Target location and intensity threshold sent to rover 

All subsequent processing occurs onboard 

5. Identify 3-D location of rock based on calibrated camera models and onboard stereo image processing 

6. Compute single-arc rover trajectory to target 

7. Drive rover toward target 

8. Periodically (every 10 cm) poll the target tracking software to update target location using new stereo 
pair and current odometry 

9. Redirect rover toward the new target location using new single-arc trajectory, and repeat until target 
is within 1 cm of goal position. 

10. Deploy sampling arm and pick up rock.


Table 1: Algorithm for small-rock acquisition
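
The onboard part of Table 1 (steps 6-9) can be read as a closed loop. The following Python sketch is an illustration under stated assumptions, not the code that ran on Rocky 7: `drive` and `reacquire` are hypothetical interfaces standing in for the motion controller and the stereo tracker, the motion is a straight line rather than an arc, and the `SlippingRover` class with its 80% traction figure is an invented toy simulation.

```python
import math

STEP_M = 0.10       # Table 1, step 8: re-check the target every 10 cm
TOLERANCE_M = 0.01  # Table 1, step 9: stop within 1 cm of the goal


def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def approach(target, goal, drive, reacquire, max_steps=200):
    """Repeat steps 6-9 of Table 1 until the target sits at `goal`.

    target, goal: (x, y) in the rover frame, in metres.
    drive(target, goal, step): moves the rover and returns the target
        position predicted from odometry (hypothetical interface).
    reacquire(predicted): returns the visually refined target position
        (hypothetical interface).
    Returns the final target estimate and the number of moves made.
    """
    for moves in range(max_steps):
        error = distance(target, goal)
        if error <= TOLERANCE_M:
            return target, moves
        predicted = drive(target, goal, min(STEP_M, error))
        target = reacquire(predicted)
    raise RuntimeError("goal not reached within max_steps")


class SlippingRover:
    """Toy simulation: the rover covers only `grip` of each commanded step."""

    def __init__(self, true_target, grip=0.8):
        self.true_target = true_target
        self.grip = grip

    def drive(self, target, goal, step):
        # Straight-line motion for brevity; the paper drives along an arc.
        d = distance(target, goal)
        ux, uy = (target[0] - goal[0]) / d, (target[1] - goal[1]) / d
        moved = self.grip * step
        self.true_target = (self.true_target[0] - moved * ux,
                            self.true_target[1] - moved * uy)
        # Odometry believes the full step was executed.
        return (target[0] - step * ux, target[1] - step * uy)

    def reacquire(self, predicted):
        # Idealized tracker: always recovers the true position.
        return self.true_target


goal = (0.3, 0.0)
rover = SlippingRover(true_target=(1.0, 0.2))
estimate, moves = approach((1.0, 0.2), goal, rover.drive, rover.reacquire)
print(moves, round(distance(rover.true_target, goal), 4))
```

Passing `lambda p: p` as `reacquire` turns the loop into pure dead reckoning: the estimate reaches the goal, but in this toy model the simulated rock is still roughly 15 cm away, which is the kind of error the visual reacquisition step is meant to remove.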

2 Related Work 

First described in [WSN85], visual servoing strate- 
gies incorporate vision sensing with the actuation of 
motors in a robotic system. Often simple
image-processing filters are used to locate a target of
interest, and knowledge of the camera system geometry
and manipulator kinematics is used to control
motor current. This technique has been applied suc- 
cessfully to the active placement of a manipulator at 
high frame rates (e.g., in [HGT95], [PK93], [Nis90], 
and [THM+96]). In this application the distance of 
the target from the camera system usually remains 
the same, so the relative size of the object will re- 
main constant throughout the servoing process. 

In our case the entire robot, not just a manipulator, 
is being directed toward a goal point. Visual servo- 
ing for vehicle motion should be a useful tool, because 
the uncertainties introduced by motion over unknown 
terrain could potentially be eliminated by the visual 
tracking. However, as the vehicle approaches the tar- 
get, the target’s image size grows dramatically be- 
tween updates, and a correlation search on the inten- 
sity image tends to fail. Therefore approaches such 
as [WTB97] work well at long distances, but are less 
reliable at the final approach to the object. 
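
To make the scale problem concrete, the sketch below uses an idealized pinhole camera (an assumption; the real cameras have lens distortion), in which apparent size is inversely proportional to distance. The function name `scale_ratio` is ours.

```python
def scale_ratio(distance_m, step_m):
    """Factor by which a target's image size grows after the camera
    moves step_m closer, under an ideal pinhole model."""
    if not 0 <= step_m < distance_m:
        raise ValueError("step must be shorter than the distance")
    return distance_m / (distance_m - step_m)


far = scale_ratio(5.0, 0.5)   # 50 cm step taken 5 m from the target
near = scale_ratio(1.0, 0.5)  # the same step taken 1 m from the target
print(round(far, 3), round(near, 3))
```

The same step that enlarges the target by about 11% at 5 m doubles its size at 1 m, which is why a fixed-size correlation template degrades near the end of the approach.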

3 Approach 

The general problem we attempted to solve is the 
identification and collection of an interesting rock 
sample, in a control architecture that meets the con- 
straints of interplanetary operation. This latter re- 
quirement is summarized as follows: there will be a 
high latency in communication between the operator 
and rover (from 4 to 21 minutes one-way), and the 
number of messages sent must be minimized. For
example, during Mars Pathfinder operations in 1997,
logistical constraints on the Deep Space Network dic- 
tated that only two 5-minute communications win- 
dows were available each day. 

This general problem can be broken down into a se- 
ries of steps: Target Selection, Rover Motion toward 
the Target, Target Visual Reacquisition (these two 
steps might repeat a number of times), and Target 
Grasping. The first of these steps, Target Selection, 
is an extremely difficult task to automate, because 
it would require the rover to determine which sam- 
ples are scientifically interesting. We felt this was a 
task best left to scientists, and therefore designed our 
system to require a single round-trip transmission to 
allow a human scientist to perform it. We felt that 
the remaining steps could be made sufficiently robust 
to be implemented entirely onboard the rover. 

A summary of our algorithm for sample collection 
can be found in Table 1. The following subsections 
describe each component of the algorithm in detail, 
and refer back to the numbered steps in Table 1. 

3.1 Target Selection 

Target Selection is the first step of our sample ac- 
quisition process (steps 1-4 in Table 1). We assume 
the rover is already deployed in the area of interest, 
and has taken a stereo pair of images of the terrain 
in front of it. We transmit the left image from this
stereo pair over the wireless network to a human op- 
erator who inspects the image, locates an interesting 
sample (a surface rock small enough to be grasped 
by the robot arm), selects it with the mouse, and 
transmits its image location back to the rover. Fig- 
ure 2 illustrates a sample target selection. This step 
requires one round-trip communication between the 
rover and operator. 




Figure 2: Sample target selection in Java GUI dis- 
play. The selected target is shown zoomed in. 

We found it necessary in later processing to 
segment out the rock from its background using 
brightness-based intensity thresholding. So in addi- 
tion to the image coordinates of the target rock, the 
operator communicates a brightness threshold and 
range to the rover (e.g., “pixels with 8-bit intensity
darker/lighter than 145 should be considered rocks”).
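
A minimal sketch of such a threshold test follows. The function name and the `rocks_are_darker` flag are illustrative assumptions; the paper does not specify how the threshold and its direction are encoded in the message.

```python
def classify_pixels(image, threshold, rocks_are_darker):
    """Label each 8-bit pixel as rock (True) or background (False).

    image: 2-D list of intensities in 0..255.
    rocks_are_darker: direction of the test chosen by the operator.
    """
    if rocks_are_darker:
        return [[p < threshold for p in row] for row in image]
    return [[p > threshold for p in row] for row in image]


image = [[200, 100],
         [145, 30]]
dark_rocks = classify_pixels(image, 145, rocks_are_darker=True)
light_rocks = classify_pixels(image, 145, rocks_are_darker=False)
```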

3.2 Rover Motion toward the Target 

Next the rover performs computations and moves to- 
ward its target (steps 5-7 and 9 in Table 1). Once the 
rover receives the goal point in image coordinates, it 
uses stereo image processing and a geometric cam- 
era model to compute the (X,Y,Z) location of the 
target in the rover reference frame. Details of the 
JPL Stereo Vision algorithm can be found in [XM97]. 
Note that the goal location is stored in the 3-D rover 
reference frame, not a 2-D image frame. 
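
For readers unfamiliar with stereo ranging, the sketch below shows the textbook triangulation for an ideal, rectified pinhole stereo pair. It is not the JPL algorithm of [XM97], which uses calibrated camera models; the parameter names and the numbers in the usage line are assumptions, and the result is in the camera frame (x right, y down, z forward), so a further fixed transform would be needed to reach the rover frame.

```python
def triangulate(u, v, disparity, focal_px, baseline_m, cx, cy):
    """Return (x, y, z) in metres for pixel (u, v) of the left image.

    disparity: horizontal shift of the same point in the right image, pixels.
    focal_px:  focal length in pixels; (cx, cy) is the image centre.
    """
    if disparity <= 0:
        raise ValueError("no valid stereo match at this pixel")
    z = focal_px * baseline_m / disparity
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return x, y, z


# Hypothetical numbers: 200-pixel focal length, 10 cm baseline.
point = triangulate(u=140, v=120, disparity=20,
                    focal_px=200.0, baseline_m=0.1, cx=120, cy=120)
```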

Having computed a location in world coordinates, 
a single arc is computed that should bring the rover 
close enough to the target that it appears within the 
workspace of the arm (see Figure 3). Our experi- 
mental arm had only 2 degrees of freedom, so it was 
important that the rover be positioned correctly to 
within a small tolerance, i.e., about 30% of the size 
of the 2 DOF gripper. 
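
One way to compute such an arc is sketched below, assuming a planar rover frame with x forward and y to the left, the rover at the origin, and a goal point that the rover origin itself should reach (a simplification: in practice the goal is offset so that the rock lands in the arm workspace). The paper does not give its formula; this is the standard circle tangent to the heading at the origin.

```python
import math


def single_arc(x, y):
    """Return (curvature in 1/m, arc length in m) of the circular arc
    that leaves the origin heading along +x and passes through (x, y)."""
    if x <= 0:
        raise ValueError("goal must be ahead of the rover")
    curvature = 2.0 * y / (x * x + y * y)
    if abs(curvature) < 1e-9:
        return 0.0, x  # straight line
    heading_change = 2.0 * math.atan2(y, x)
    return curvature, heading_change / curvature


straight = single_arc(1.0, 0.0)
quarter_turn = single_arc(1.0, 1.0)  # quarter circle of radius 1 m
```

The curvature follows from requiring the circle centred at (0, R) to pass through (x, y), which gives R = (x² + y²) / (2y); the heading change is twice the bearing to the goal.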

The rover is then commanded to move a short
distance along the arc (10 cm or the remaining
distance to goal, whichever is smaller), and its position
is reevaluated in the next step.

Figure 3: Single arc trajectory generation

3.3 Target Visual Reacquisition 

Having made partial progress toward the goal, the 
rover stops to evaluate its current position (step 8 
in Table 1). This update is initialized by subtract- 
ing the motion just taken from the target location 
in the rover frame. The motion just taken is esti- 
mated by computing vehicle odometry from wheel ro- 
tations. This is a very noisy estimate, because noth- 
ing is known about the surface on which the rover is 
moving; it could consist of pebbles, sand, sticky tar, 
or solid rock. 
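
The dead-reckoning update can be sketched as a planar frame change. The code assumes that odometry reports the distance travelled and the arc curvature, that the terrain is flat, and that the rover frame has x forward and y to the left; none of these details are spelled out in the paper.

```python
import math


def rover_motion(arc_length, curvature):
    """Pose (x, y, heading) reached by driving an arc from the origin."""
    if abs(curvature) < 1e-9:
        return arc_length, 0.0, 0.0
    heading = arc_length * curvature
    return (math.sin(heading) / curvature,
            (1.0 - math.cos(heading)) / curvature,
            heading)


def predict_target(target, arc_length, curvature):
    """Express the old target position (x, y) in the new rover frame."""
    px, py, heading = rover_motion(arc_length, curvature)
    dx, dy = target[0] - px, target[1] - py
    c, s = math.cos(heading), math.sin(heading)
    # Rotate by -heading to account for the rover's new orientation.
    return (c * dx + s * dy, -s * dx + c * dy)


after_straight = predict_target((1.0, 0.2), 0.10, 0.0)
after_turn = predict_target((1.0, 2.0), math.pi / 2, 1.0)
```

In the second call the rover drives a quarter circle of radius 1 m, ending at (1, 1) facing along the old y axis, so a target at (1, 2) is predicted 1 m straight ahead.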

A starting point in a fresh stereo image pair is com- 
puted from this new estimated location, and a small 
window around that point is searched in an attempt 
to locate the target. However, instead of searching 
the raw intensity image we automatically compute a 
range image from the stereo image pair, and search 
the resulting elevation map for the shape of the tar- 
get, rather than its visual appearance. In particu- 
lar, we assume that any target rock will be resting 
higher on the ground than its nearby surroundings, 
and lock in on the local elevation maximum as the 
new, refined 3D target point. We may not always 
achieve a completely dense elevation map from the 
range data, so before searching for the local maxi- 
mum we linearly interpolate any data missing from 
the range image. Given this dense, interpolated el- 
evation map, we start at the best estimate of the 
target location and “climb” to higher elevations until 
we reach a local maximum. 
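
The interpolate-then-climb procedure can be sketched as follows. The elevation map is a small 2-D list with `None` marking missing range data; interpolating along each row and climbing over 8-connected neighbours are our assumptions, since the paper states only that the interpolation is linear.

```python
def fill_row(row):
    """Linearly interpolate None gaps between valid samples in one row.
    Gaps at either end are filled with the nearest valid value."""
    valid = [i for i, v in enumerate(row) if v is not None]
    if not valid:
        raise ValueError("row has no range data")
    out = list(row)
    for i in range(len(row)):
        if out[i] is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            out[i] = row[right]
        elif right is None:
            out[i] = row[left]
        else:
            t = (i - left) / (right - left)
            out[i] = row[left] + t * (row[right] - row[left])
    return out


def climb(elev, start):
    """Steepest-ascent walk from start (row, col) to a local maximum."""
    r, c = start
    while True:
        best = (r, c)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < len(elev) and 0 <= nc < len(elev[0]):
                    if elev[nr][nc] > elev[best[0]][best[1]]:
                        best = (nr, nc)
        if best == (r, c):
            return best
        r, c = best


raw = [[0.0, 0.00, 0.00, 0.00, 0.0],
       [0.0, 0.01, None, 0.03, 0.0],
       [0.0, 0.02, 0.05, 0.03, 0.0],
       [0.0, 0.00, 0.00, 0.00, 0.0]]
dense = [fill_row(row) for row in raw]
peak = climb(dense, start=(0, 0))
```

Because each move goes to a strictly higher cell, the walk always terminates, but it stops at the first local maximum it meets, which need not be the target.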

Unfortunately, early experiments showed that on
a sandy surface, the error in the odometry estimate
was sufficient to cause this method to lose the tar- 
get. That is, the search window was centered too 
far away from the target rock for a simple gradient- 
ascent climb to recover it, even after relatively small 
motions. A general solution to this problem would be 
to incorporate more effective position and pose sens- 
ing and estimation into the rover. We anticipate that 
the work described in [Bal99] will provide such esti- 
mates and will be incorporated onboard the Rocky 7 
rover soon, but it was not available during the time- 
frame of our project. 

Instead, we took advantage of the fact that our tar- 
gets were visually distinct from the background sand, 
and used an intensity filter to focus attention in the 
elevation map. Given the search window centered at 
the (noisy) estimated target location, pixels in the im- 
age window are classified in one pass as either BACK- 
GROUND or ROCK according to the threshold value 
set by the operator. The ROCK pixel nearest the cen- 
ter of the search window is then treated as part of the 
target, and the enclosing blob of ROCK pixels is
relabeled as TARGET pixels. Finally, the centroid of
all TARGET pixels is computed, and its range value 
(perhaps an interpolated value) is used as the starting 
point for the climb to the local elevation maximum. 
Using the centroid preserves the scale-invariance of 
our method. In fact, any pixel classification tech- 
nique can be used instead of brightness: on a flight 
mission one might use spectral filters to distinguish 
rocks from non-rocks, as in [PAW+98]. 
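
The blob selection described above can be sketched as a flood fill. The input is assumed to be the already-thresholded search window (True for ROCK); 4-connectivity and the function name are our choices, as the paper does not say how the enclosing blob is delimited.

```python
from collections import deque


def target_centroid(mask):
    """Return the (row, col) centroid of the ROCK blob nearest the centre
    of the search window, or None if the window contains no ROCK pixels."""
    rows, cols = len(mask), len(mask[0])
    centre_r, centre_c = (rows - 1) / 2.0, (cols - 1) / 2.0
    rock = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    if not rock:
        return None
    # Seed: the ROCK pixel nearest the window centre.
    seed = min(rock, key=lambda p: (p[0] - centre_r) ** 2
                                   + (p[1] - centre_c) ** 2)
    # Flood fill relabels the connected ROCK pixels as TARGET.
    blob, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and mask[nr][nc] and (nr, nc) not in blob):
                blob.add((nr, nc))
                queue.append((nr, nc))
    n = len(blob)
    return (sum(r for r, _ in blob) / n, sum(c for _, c in blob) / n)


T, F = True, False
window = [[T, F, F, F, F],   # a second rock in the corner is ignored
          [F, F, F, F, F],
          [F, F, T, T, F],
          [F, F, T, T, F],
          [F, F, F, F, F]]
centroid = target_centroid(window)
```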

If no range data are available, then no refinement
is done, and the vehicle odometry is assumed to be
correct.

The new target location is fed back into the Rover 
Motion toward Target step, and vehicle motion con- 
tinues until the target is found to be within the 
workspace of the arm. 


3.4 Target Grasping 

Finally, having determined that the target lies within 
the workspace of the arm, the arm is deployed and the 
target grasp is attempted (step 10 in Table 1). We 
use the difference between the actual and commanded 
trajectories from the motor encoders to tell when the 
arm makes contact with the target or ground, then 
close the gripper on the target. Instead of lifting off 
right away, we raise the arm a small amount and con- 
tinue to close the gripper until it stops, several times 
more. This redundancy helps ensure that the gripper 
has a good hold on the target. 
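
Contact detection from tracking error can be sketched as a simple threshold test. The units (encoder counts), the threshold, and the sample trajectories below are invented for illustration; the paper does not report the values used on Rocky 7.

```python
def first_contact(commanded, actual, threshold):
    """Return the index of the first sample at which the measured joint
    position deviates from the commanded one by more than threshold,
    or None if the arm never stalls."""
    for i, (cmd, act) in enumerate(zip(commanded, actual)):
        if abs(cmd - act) > threshold:
            return i
    return None


commanded = [0, 10, 20, 30, 40, 50]  # desired encoder counts per tick
actual = [0, 10, 19, 29, 31, 31]     # arm stops moving after tick 3
contact_index = first_contact(commanded, actual, threshold=5)
```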


4 Experimental Results 

As a testbed for these algorithms, we used the Rocky 7
Mars Rover prototype [Vol99] (see Figure 1). Rocky 7
is a 6-wheeled vehicle with rocker-bogie suspension
and one set of steerable wheels. Batteries and
solar cells provide about 50 Watts of power. A small
2 DOF arm with 2 DOF gripper mounted on one
side of the vehicle is used for digging and grasping
rock samples, and an extendible 3 DOF mast
provides stereo image views from as high as 1.5 meters
above the ground. For terrestrial work,
communication is via a 1 Mbit/sec wireless Ethernet bridge or
a 10 Mbit/sec coax hard line. Onboard processing
consists of a 60 MHz 68060 CPU running the
VxWorks 5.3 operating system in 16 megabytes of RAM.
Vision sensors include three pairs of stereo cameras:
one body-mounted pair faces the arm, another
body-mounted pair is on the other side of the vehicle, and
the third pair is mounted near the end-effector on 
the extendible mast. All cameras are 480x512 CCD 
board cameras (but currently only half-resolution im- 
ages are used), and the body-mounted cameras have 
an effective FOV of 103 degrees, while the mast cam- 
eras have an effective FOV of 43 degrees. The body- 
mounted cameras are approximately 30 cm above the 
ground, point downward at an angle of approximately 
45 degrees, and are used primarily for detection of 
nearby obstacles. During these experiments the vehi- 
cle moved approximately 5 cm/sec and paused briefly 
during the image acquisition and path generation 
steps. 

We performed several experiments in JPL’s
Mars Yard¹, and successfully demonstrated the
autonomous acquisition of small rocks (3-5 cm) located
over 1 meter in front of the rover. Figure 4 shows a 
sample tracking sequence, with the target indicated 
in each frame by a dark square. Execution of the en- 
tire sequence (Target Selection, 8-10 iterations of 
Target Reacquisition, and successful Target Grasp- 
ing) typically completed within one minute when the 
target was just over 1 meter away. 

Many experiments were run, and 14 complete im- 
age/odometry datasets were collected. When run 
over these datasets, the visual tracker succeeded 
in maintaining target lock through 10 complete se- 
quences. Primary failure modes were due to abrupt 
intensity changes because of indoor lighting or rover 
shadow. All but one of the failures were corrected 
by simply re-running the visual tracker with a more 
appropriate intensity threshold; in the final failed sequence
the target was the same color as the background.

¹ http://marscam.jpl.nasa.gov/

Figure 4: Sample tracking sequence.

In general, failures can occur when: 

• The target leaves the camera FOV, so no range 
data is available and tracking depends entirely 
upon noisy odometry. 

• The target is visible, but no range data is com- 
puted. This can happen if the stereo optics are 
not properly set for current lighting conditions. 

• Multiple targets are visible in the search win- 
dow and odometry is poor. Additional filtering 
based on range data could alleviate this, as could 
matching based on more than a single shape fea- 
ture (i.e., not just the elevation maximum). 

• The target is visible but outside the search win- 
dow. This happens when the rover climbs over 
very hilly terrain, if the pose is not measured 
and used to predict the search window starting 
point. One could search again using revised mo- 
tion parameters, or improve the pose sensing. 

• Tracking is fine, but the rock is not picked up. 
This can occur if the rover gets stuck in a ser- 
voing loop, attempting to make small changes in 
position. On sandy soil, such maneuvering
introduces much positional uncertainty.

• The target is the same color as the background, 
so the intensity filter is irrelevant or misleading. 

4.1 Mast Placement 

This algorithm was also applied successfully to the 
placement of Rocky 7’s flexible mast arm on a rock
outcropping. The limited degrees of freedom in 
Rocky 7’s mast dictate that the vehicle must face the
target point’s tangent plane on the surface of a
boulder to enable complete coverage by the end-effector.
For general targets (anywhere on the surface of a 
boulder) the surface normal is computed from the 
range data at closest approach, and a two-arc
trajectory is generated to ensure that the vehicle approaches
the rock normal to the tangent plane of the target.
However, since this algorithm servos on the local
elevation maximum, only targets on the tops of rocks
could be specified.
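
As an illustration of the surface-normal step, the sketch below derives a normal from three nearby range points and the heading from which the rover would face that surface head-on. This is a deliberately minimal stand-in: the paper does not say how the normal is computed, and a practical version would fit a plane to many range points. The rover frame is assumed to have x forward, y left, and z up, with the rover at the origin.

```python
import math


def surface_normal(p0, p1, p2):
    """Unit normal of the plane through three (x, y, z) points,
    oriented so that it points back toward the rover at the origin."""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(c * c for c in n))
    if length == 0:
        raise ValueError("points are collinear")
    n = [c / length for c in n]
    if sum(c * p for c, p in zip(n, p0)) > 0:  # pointing away: flip it
        n = [-c for c in n]
    return tuple(n)


def approach_heading(normal):
    """Heading (radians, in the ground plane) that drives straight into
    the surface, i.e. opposite to the normal's horizontal component."""
    return math.atan2(-normal[1], -normal[0])


# A vertical rock face 2 m ahead, turned 45 degrees from the rover.
normal = surface_normal((2, 0, 0), (3, 1, 0), (2, 0, 1))
heading = approach_heading(normal)
```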

During several trials in the Mars Yard, Rocky 7
successfully tracked targets (the tops of boulders 20- 
50 cm tall) over 5 meters away using the 43-degree 
FOV stereo cameras in the mast head and success- 
fully placed the end effector on the target. For this 
application Target Reacquisition occurred after ev- 
ery 50 cm of motion. Execution of the entire sequence 
(Target Selection, 8-10 iterations of Target Reacqui- 
sition, and successful Mast Placement following the 
two-arc path generation) typically completed within 
four minutes when the target was just over 5 meters 
away. 

5 Future Work 

In the future we hope to reduce our dependence 
on the brightness-based filter by matching the en- 
tire shape of the terrain around the target (not just 
its peak) using the technique of [Ols99], and by
improving the position and pose estimates using visual
feature tracking on the whole scene using a tech- 
nique from [Mat89]. These improvements should al- 
low tracking of targets anywhere on a rock, enabling 
a more general mast placement capability, and should 
also enable tracking of targets that leave the field of 
view. We would also like to be able to specify
multiple targets in a single image, and enable the rover
to keep track of (and acquire) them accurately even
if they leave the field of view of the cameras.